Versions:
Lemonade Server 10.2.0, developed by AMD, is a lightweight inference server that exposes local large-language-model execution through an OpenAI-compatible REST endpoint, so any application that already speaks the OpenAI API can swap a cloud endpoint for private on-device compute. It orchestrates Llama, Mistral, Gemma and other popular model families across Radeon GPUs and Ryzen AI NPUs, delivering sub-second prompt evaluation and token generation without uploading data to external services. This makes it suitable for code-assist plugins, chat front-ends, knowledge-base Q&A tools, transcription enhancers and automated report writers that must remain offline or HIPAA-compliant.

The current stable branch, 10.2.0, refines memory mapping, adds INT4 and FP8 quantization paths, and exposes new `/v1/completions` and `/v1/embeddings` routes so that vector databases can also run locally. Seventeen numbered releases have appeared since the project's debut, each expanding the supported hardware roster and model zoo while keeping the same one-click service model.

Installation creates a headless Windows service that listens on `localhost:11434` by default, reuses the user's Hugging Face cache, and automatically selects the fastest available accelerator (a discrete RX 7900 XTX, an integrated Radeon 780M, or a Ryzen AI NPU), falling back to CPU inference when necessary. Because the executable is fully portable and writes no keys to the registry, it fits naturally onto portable dev sticks, air-gapped labs, or classroom laptops where network access is restricted but AI assistance is still desired.

Lemonade Server is available for free on get.nero.com, with downloads provided via trusted Windows package sources such as winget.
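Because the server speaks the OpenAI wire format, any existing OpenAI client can target it by pointing the base URL at the local service. The sketch below shows the shape of the `/v1/completions` and `/v1/embeddings` request payloads; the port is taken from the description above, and the model identifier is a hypothetical placeholder (no request is actually sent):

```python
import json

# Assumptions: the default port comes from the listing (localhost:11434);
# "llama-3.2-1b" is a hypothetical model identifier, not a confirmed name.
BASE_URL = "http://localhost:11434"
COMPLETIONS_URL = BASE_URL + "/v1/completions"
EMBEDDINGS_URL = BASE_URL + "/v1/embeddings"


def build_completion_request(prompt: str, model: str = "llama-3.2-1b",
                             max_tokens: int = 64) -> dict:
    """Build an OpenAI-style /v1/completions payload."""
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}


def build_embeddings_request(texts: list, model: str = "llama-3.2-1b") -> dict:
    """Build an OpenAI-style /v1/embeddings payload."""
    return {"model": model, "input": texts}


if __name__ == "__main__":
    payload = build_completion_request("Summarize NPU offload in one sentence.")
    print(json.dumps(payload, indent=2))
    # The payload could then be POSTed with any HTTP client, e.g. with
    # urllib.request, using a Content-Type of application/json.
```

Since the endpoint is OpenAI-compatible, official OpenAI client libraries should also work once their base URL is redirected to `localhost:11434`.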
Tags: